
[SPARK-13599] [BUILD] remove transitive groovy dependencies from Hive #11449

Conversation

steveloughran
Contributor

What changes were proposed in this pull request?

Modifies the dependency declarations of all the Hive artifacts to explicitly exclude the groovy-all JAR.

This stops the Groovy classes, and everything else in that uber-JAR, from getting into the spark-assembly JAR.
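The change described above can be sketched roughly as follows. This is a hypothetical illustration of a Maven `<exclusion>` on one of the Hive artifacts; the exact dependency coordinates in Spark's pom.xml may differ:

```xml
<!-- Hypothetical sketch: exclude the groovy-all uber-JAR from a Hive
     dependency. The actual groupId/artifactId used in Spark's pom.xml
     may differ; shown here only to illustrate the shape of the change. -->
<dependency>
  <groupId>org.spark-project.hive</groupId>
  <artifactId>hive-exec</artifactId>
  <exclusions>
    <exclusion>
      <groupId>org.codehaus.groovy</groupId>
      <artifactId>groovy-all</artifactId>
    </exclusion>
  </exclusions>
</dependency>
```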

How was this patch tested?

  1. Pre-patch build was made: `mvn clean install -Pyarn,hive,hive-thriftserver`
  2. The spark-assembly JAR was expanded and observed to contain the org.codehaus.groovy packages and JARs
  3. A Maven dependency tree was created: `mvn dependency:tree -Pyarn,hive,hive-thriftserver -Dverbose > target/dependencies.txt`
  4. The text file was examined to confirm that groovy was being imported as a dependency of `org.spark-project.hive`
  5. Patch applied
  6. Repeated step 1: clean build of the project with `-Pyarn,hive,hive-thriftserver` set
  7. Examined the created spark-assembly; verified no org.codehaus packages
  8. Verified that the Maven dependency tree no longer references groovy

Note also that the assembly JAR was 181628646 bytes before this patch and 166318515 bytes after: roughly 15 MB smaller. That's a good indication that things are being excluded.

@SparkQA

SparkQA commented Mar 1, 2016

Test build #52249 has finished for PR 11449 at commit bc120b4.

  • This patch fails build dependency tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@JoshRosen
Contributor

Mind adding a Maven Enforcer rule so that the build will fail in case this is ever reintroduced?

@JoshRosen
Contributor

Enforcer rules are also a nice chance to document why something is being excluded, etc.
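A minimal sketch of the kind of rule being suggested, using maven-enforcer-plugin's standard `bannedDependencies` rule (not necessarily the configuration the final patch used):

```xml
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-enforcer-plugin</artifactId>
  <executions>
    <execution>
      <id>enforce-banned-dependencies</id>
      <goals>
        <goal>enforce</goal>
      </goals>
      <configuration>
        <rules>
          <bannedDependencies>
            <excludes>
              <!-- SPARK-13599: groovy-all is an uber-JAR pulled in
                   transitively by Hive; keep it out of the assembly. -->
              <exclude>org.codehaus.groovy:groovy-all</exclude>
            </excludes>
          </bannedDependencies>
        </rules>
      </configuration>
    </execution>
  </executions>
</plugin>
```

Such a rule fails the build if the banned coordinates ever reappear anywhere in the dependency graph, and the comment documents why the exclusion exists.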

@SparkQA

SparkQA commented Mar 1, 2016

Test build #52250 has finished for PR 11449 at commit 7bedeec.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@steveloughran
Contributor Author

...let me work out how to do enforcer rules

@srowen
Member

srowen commented Mar 2, 2016

OK by me. I'm still trying to work out whether it belongs in 1.6.2 or not -- thoughts?

@steveloughran
Contributor Author

+1 for 1.6.x.

W.r.t. 1.6.2, it'll keep the tar smaller, maybe even load faster, and eliminate the risk of a CVE.

If you set `spark.authenticate=true`, untrusted callers can't submit malicious object streams via Kryo, so there's less vulnerability.

@SparkQA

SparkQA commented Mar 2, 2016

Test build #52320 has finished for PR 11449 at commit 8b6f422.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@steveloughran
Contributor Author

Added PR #11473 to cover only the `<exclusion>` part of the patch, as that is all that applies to 1.6.x.

@rxin
Contributor

rxin commented Mar 2, 2016

Why would we want to backport this into branch-1.6? We rarely update dependencies, unless there is a security problem.

@srowen
Member

srowen commented Mar 3, 2016

Yeah, I'm on the fence; it seems low risk if it's really not used, but that also makes it low priority. Steve, you're saying there's a potential security risk in Groovy -- is it purely theoretical, or do you have reason to believe there's an actual risk? You mentioned you definitely wanted to back-port, so I suspected there was a bit more motive in there.

@steveloughran
Contributor Author

The risk is deserialization: Groovy CVE-2015-3253 shows how Groovy < 2.4.4 makes it straightforward to use a Groovy class to run arbitrary shell commands on the destination. This has been shown on Java ObjectStream and XStream, so assume Kryo is vulnerable too.

@steveloughran
Contributor Author

I should add that, as well as the org.codehaus.groovy package, there are various shaded things under groovy/ and an unshaded copy of antlr. This may create versioning problems with the antlr JAR pulled in by spark-catalyst, though it'd depend on which version of the antlr classes got pulled in and on antlr's compatibility story.

@steveloughran
Contributor Author

FWIW, we're backporting it in-house. Without it, downstream applications which try to add a secure Groovy to their jobs won't know which version is picked up.

@steveloughran
Contributor Author

The Groovy and XStream attack is already public. Assume that you can do the same with Kryo; it just takes someone to sit down and do the work.

@rxin
Contributor

rxin commented Mar 3, 2016

OK thanks for the information.

@rxin
Contributor

rxin commented Mar 3, 2016

Merging this in master.

@asfgit asfgit closed this in 9a48c65 Mar 3, 2016
@steveloughran
Contributor Author

thx

asfgit pushed a commit that referenced this pull request Mar 7, 2016
…-hive and spark-hiveserver (branch 1.6)

## What changes were proposed in this pull request?

This is just the patch of #11449 cherry picked to branch-1.6; the enforcer and dep/ diffs are cut

Modifies the dependency declarations of all the Hive artifacts to explicitly exclude the groovy-all JAR.

This stops the groovy classes *and everything else in that uber-JAR* from getting into spark-assembly JAR.

## How was this patch tested?

1. Pre-patch build was made: `mvn clean install -Pyarn,hive,hive-thriftserver`
1. spark-assembly expanded, observed to have the org.codehaus.groovy packages and JARs
1. A maven dependency tree was created `mvn dependency:tree -Pyarn,hive,hive-thriftserver  -Dverbose > target/dependencies.txt`
1. This text file examined to confirm that groovy was being imported as a dependency of `org.spark-project.hive`
1. Patch applied
1. Repeated step1: clean build of project with ` -Pyarn,hive,hive-thriftserver` set
1. Examined created spark-assembly, verified no org.codehaus packages
1. Verified that the maven dependency tree no longer references groovy

The `master` version updates the dependency files and an enforcer rule to keep groovy out; this patch strips it out.

Author: Steve Loughran <stevel@hortonworks.com>

Closes #11473 from steveloughran/fixes/SPARK-13599-groovy+branch-1.6.
@steveloughran steveloughran deleted the fixes/SPARK-13599-groovy-dependency branch March 8, 2016 16:16
roygao94 pushed a commit to roygao94/spark that referenced this pull request Mar 22, 2016

Author: Steve Loughran <stevel@hortonworks.com>

Closes apache#11449 from steveloughran/fixes/SPARK-13599-groovy-dependency.